DICOM TO XML GENERATOR 



BACKGROUND OF THE INVENTION 

1. Field of the Invention 

This invention relates to the field of modeling and data representation, and in particular
to the modeling and representation of medical reports, via the use of DICOM SR relational data.

2. Description of Related Art 

The Digital Imaging and Communications in Medicine (DICOM) Structured Reporting
(SR) standard, and the SR Documentation Model upon which it is based, improve the
expressiveness, precision, and comparability of documentation of diagnostic images and
waveforms. DICOM SR supports the interchange of expressive compound reports in which the
critical features shown by images and waveforms can be denoted unambiguously by the
observer, indexed, and retrieved selectively by subsequent reviewers. Findings may be expressed
by the observer as text, codes, and numeric measurements, or via location coordinates of specific
regions of interest within images or waveforms, or references to comparison images, sound,
waveforms, curves, and previous report information. The observational and historical findings
recorded by the observer may include any evidence referenced as part of an interpretation
procedure. Thus, DICOM SR supports not only the reporting of diagnostic observations, but the
capability to document fully the evidence that evoked the observations. This capability provides
significant new opportunities for large-scale collection of structured data for clinical research,
training, and outcomes assessment as a routine by-product of diagnostic image and waveform
interpretation, and facilitates the pooling of structured data for multi-center clinical trials and
evaluations. 1

1 "Clinical Rationale for the SR Documentation Model and the DICOM Structured Reporting (SR) Standard",
Abstract, W. Dean Bidgood, Jr., © 1999.

The DICOM SR is based on a relational data technology, and has been standardized by
the National Electrical Manufacturers Association (NEMA). Supplement 23: Structured
Reporting Storage SOP Classes to the DICOM Standard, published by the DICOM Standards
Committee, 1300 N. 17th Street, Rosslyn, VA 22209 USA, and incorporated by reference herein,
introduces the SR Service-Object Pair (SOP) Classes for transmission and storage of documents
that describe or refer to any number of images or waveforms or to the specific features that they
contain. This standard is expected to be adopted by the medical equipment manufacturers and
providers at large to provide text, image, and waveform content in a structured reporting format.

Although the DICOM SR standard provides for a consistent reporting and recording
scheme, the use of the information contained in a DICOM SR is limited to DICOM compliant
applications that can process this information using the DICOM specific format. Application
developers must be DICOM literate, and a methodology for deploying applications that
interoperate with other applications outside the DICOM domain has not yet been developed.

In the computer industry, progress has been made in the use of standardized languages
and methodologies that facilitate the use of information from a variety of sources by a variety of
applications. A standard language that is widely used for processing content material is the
World Wide Web Consortium Extensible Markup Language (XML), which is derived from the
Standard Generalized Markup Language (SGML), and is designed to describe data and its
structure so that it can be easily transferred over a network and consistently processed by the
receiver. Because XML is used to describe information as well as structure, it is particularly well
suited as a data description language. One of XML's particular strengths is that it allows entire
industries, academic disciplines, and professional organizations to develop sets of Document Type
Definitions (DTDs) and Schemas that can serve to standardize the representation of information
within those disciplines. Given a set of DTDs and Schemas, content material that is modeled in
conformance with the DTDs and Schemas can be processed by applications that are developed
for these DTDs and Schemas.

A further advantage of the use of XML is the wealth of tools that are available for the
processing of XML-compatible data. Of particular significance, the "Extensible Stylesheet
Language" (XSL) is a language for expressing stylesheets, and the "XSL Transformations"
(XSLT) is a language for transforming XML documents into other XML documents, using
stylesheets. A stylesheet contains a set of template rules, which are used to match a pattern to a
source document, or "source tree" and, when the appropriate match is found, to instantiate a
template to a result document, or "result tree". In this manner, XML information that is
structured for one application can be relatively easily transformed into a different structure for
another application.

BRIEF SUMMARY OF THE INVENTION

Although XML may be considered a relatively new and specialized language, it can be 
expected that more programmers and other computer professionals will be familiar with XML 
than those who are familiar with DICOM. Additionally, it can be expected that more general- 
purpose utilities and applications will be available for use on XML encoded information than 
will be available for use on DICOM SR encoded information. 

An objective of this invention, therefore, is to provide a method and system that facilitate 
the creation of XML representations of DICOM SR representations and associated information. 
A further objective of this invention is to provide a method and system that facilitate the creation 
of XML representations of DICOM SR and other DICOM objects, with minimal information 
loss. A further objective of this invention is to provide a method and system for creating an 
XML representation of DICOM objects that is flexible and extensible. 

These objectives and others are achieved by providing a conversion system that converts 
DICOM SR information from a DICOM-formatted file into an XML representation. By 
providing a mapping between DICOM SR and XML, the DICOM SR content material can be 
more easily processed by application programs that are DICOM-specific, such as medical 
analysis programs, as well as by application programs that are not DICOM-specific, such as 
routine clerical or data-management programs. In a preferred embodiment, a two-phase 
conversion is employed. The DICOM information is parsed and converted directly into a "raw" 
XML data set. Thereafter, the "raw" XML is transformed into a proper XML output form, via an 
XSLT processor. Changes to the desired XML output form can thus be effected via changes in 
the corresponding XSLT stylesheets. 

BRIEF DESCRIPTION OF THE DRAWINGS

The invention is explained in further detail, and by way of example, with reference to the 
accompanying drawings wherein: 

FIG. 1 illustrates an example block diagram of a DICOM to XML conversion system in
accordance with this invention.

FIG. 2 illustrates an example component structure diagram of a DICOM to XML conversion
system in accordance with this invention.

FIG. 3 illustrates an example flow diagram for the conversion of a DICOM object into an XML
representation in accordance with this invention.

FIGs. 4A-C illustrate example XSLT stylesheets for the conversion of raw XML formatted
information into an XML format that is consistent with DICOM-specific DTDs and Schemas.

Throughout the drawings, the same reference numerals indicate similar or corresponding
features or functions.

DETAILED DESCRIPTION OF THE INVENTION

As noted above, although applications can be developed that utilize DICOM's relational
structured reporting scheme directly, it can be expected that the number of programmers and
other computer professionals who are familiar with XML and object-oriented technologies and
techniques will be substantially greater than the number who are familiar with DICOM and
relational technologies and techniques.

Copending U.S. patent application "UML MODEL AND XML REPRESENTATIONS
OF DIGITAL IMAGING AND COMMUNICATIONS IN MEDICINE STRUCTURED
REPORTS (DICOM SR)", serial number 09/686,401, filed 10 October 2000 for Alfredo Tirado-
Ramos, Jingkun Hu, and Yasser alSafadi, Attorney Docket US000268, incorporated by reference
herein, discloses a system and method for transforming the DICOM SR specification into a
UML (Unified Modeling Language) model to facilitate an understanding of the DICOM SR by
non-DICOM systems analysts and system designers. The system and method also include a
transformation of this UML model into XML Document Type Definitions (DTDs) and XML
Schemas, a transformation of a DICOM SR report into a UML document, and a transformation
of the UML document into an XML document. Although this system and method are
particularly well suited for conveying an understanding of DICOM SR to non-DICOM
professionals, and facilitate the development of XML application programs, the transformation
of DICOM SR reports to XML via a UML transformation introduces an intermediate level of
abstraction. This additional level of model-abstraction may result in a loss of information,
because the UML modeling language is primarily designed to model structures and interactions,
not data.

Concurrently filed U.S. patent application "DICOM XML DTD/SCHEMA
GENERATOR", serial number , filed for Jingkun Hu and Kwok Pun Lee,
Attorney Docket US010070 and incorporated by reference herein, discloses a system and
method for transforming the DICOM SR specification directly into XML Document Type
Definitions (DTDs) and XML Schemas, and is expected to further increase the use of XML as
the language of choice for processing DICOM SRs and other DICOM documents.

This invention is based on the premise that DICOM-related application programs will be
developed as XML-enabled applications, and that a variety of existing XML-enabled
applications can be used to address clerical and administrative tasks related to the information
contained in the DICOM reports.

FIG. 1 illustrates an example block diagram of a DICOM to XML conversion system 100
in accordance with this invention. The conversion system 100 transforms a DICOM input file
110, such as a DICOM Structured Report (DICOM-SR), into a corresponding XML document
160. A DICOM parser 120 extracts the attributes from the DICOM input file 110, and provides
these attributes to an XML builder 130. In the DICOM environment, an attribute is the core data
conveyance device. Attributes in a diagnostic report, for example, will identify the patient, the
diagnostician, the procedure used, the particular results found, references to other items, such as
X-ray images, coordinates of items of interest in the reference image, and so on.

In a preferred embodiment, the XML builder 130 is configured to effect a straightforward
translation of each DICOM attribute, without consideration for the particular format or structure
required by an application program that is intended to process the DICOM-XML attributes.
Alternatively, the XML builder 130 may be configured to format the DICOM-XML attributes in
accordance with a particular set of XML DTDs and Schemas that are designed for use in a
particular application. By partitioning the XML-conversion from the XML-formatting, the
resultant system is expected to be more flexible and robust than a composite system, consistent
with the principles of well-structured design. For ease of reference, the directly-translated
attributes from the XML builder 130 are herein referred to as "raw" XML data.

In a preferred embodiment, the raw XML data is processed via an XSLT (Extensible 
Stylesheet Language Transformations) engine 140. An additional advantage of segregating the
XML-conversion from the XML-formatting is that existing XML-transformation tools and 
techniques can be used to effect the desired output XML format structure. In this preferred use of 
XSLT, the desired output XML format is specified using XSLT stylesheets 150, discussed 
further below. These stylesheets 150 are defined based on DTDs and Schemas that define the 
format used by an application program. If a DICOM-XML standard is adopted for DICOM 
processing applications, then the use of stylesheets 150 that are compatible with this standard 
will allow the DICOM-XML data that is produced by the conversion system 100 to be processed 
by each application that is compatible with the standard. If a variety of DICOM-XML formats
are defined, a different set of stylesheets 150 can be provided for each format, thereby
allowing the use of the same builder 130, regardless of the particular output format.

FIG. 2 illustrates an example component structure diagram of a DICOM to XML 
conversion system 100 in accordance with this invention. As illustrated, the DICOM-to-XML 
converter 100 calls each of the three processes 120, 130, 140, as required. In a preferred 
embodiment, the DICOM parser 120 accesses any of a variety of conventional DICOM
"toolkits" that are available commercially, thereby easing the development of routine
DICOM-related processing tasks. For example, the DICOM file 110 in FIG. 1 is typically a
"binary" file having a well-defined encoding scheme. A DICOM toolkit 210 will include the 
utility programs, subprograms, and function calls that facilitate the decoding of this binary data 
into a more convenient form for processing by the parser 120. 
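
As a rough illustration of the kind of decoding such a toolkit performs, the following Python sketch reads a single data element encoded in the DICOM Explicit VR Little Endian transfer syntax. It is not the parser 120 or any particular toolkit: the function name read_element and the sample bytes are hypothetical, only the classic set of long-length value representations is listed, and undefined-length sequences and other transfer syntaxes are deliberately ignored.

```python
import struct

# VRs that, in Explicit VR encoding, carry 2 reserved bytes followed by a
# 4-byte length. Assumption: only the classic set is listed here; later
# editions of the standard add further VRs of this kind.
LONG_LENGTH_VRS = {"OB", "OW", "OF", "SQ", "UT", "UN"}


def read_element(buf, offset=0):
    """Decode one Explicit VR Little Endian data element starting at offset.

    Returns (tag, vr, raw_value_bytes, offset_of_next_element).
    """
    group, element = struct.unpack_from("<HH", buf, offset)
    vr = buf[offset + 4:offset + 6].decode("ascii")
    if vr in LONG_LENGTH_VRS:
        (length,) = struct.unpack_from("<I", buf, offset + 8)
        value_start = offset + 12
    else:
        (length,) = struct.unpack_from("<H", buf, offset + 6)
        value_start = offset + 8
    value = buf[value_start:value_start + length]
    return (group, element), vr, value, value_start + length


# Hypothetical sample: Patient's Name (0010,0010), VR "PN", 8-byte value.
sample = struct.pack("<HH2sH", 0x0010, 0x0010, b"PN", 8) + b"Doe^John"
tag, vr, value, next_offset = read_element(sample)
print(tag, vr, value, next_offset)
```

In practice the parser 120 would rely on a toolkit for this step, which also handles nested sequences, other transfer syntaxes, and the data dictionary lookup that supplies each attribute's name.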

After the DICOM attributes are decoded from the DICOM file 110 by the DICOM parser
120, the DICOM-to-XML converter 100 invokes the XML builder 130 to create XML data 
corresponding to each of the parsed DICOM attributes. 

FIG. 3 illustrates an example flow diagram for the conversion of a DICOM object into an 
XML representation in accordance with this invention. The XML data is identified by a root 
element, at 310; in this example, the root element for the XML data is defined to be "report". 
Each DICOM attribute is subsequently processed, via the loop 320-370. If a DICOM attribute
includes other attributes, each of the other attributes is processed recursively within the loop
320-370.

If, at 330, the DICOM attribute is a "sequence" ("SQ" in DICOM terminology), an XML
element is created, at 336, having XML attributes of "CodingScheme", "CodeID", and
"ValueType", discussed further below. If, at 330, the DICOM attribute is not a sequence, an
XML element is created, at 332, having XML attributes of "CodingScheme", "CodeID",
"Value", and "ValueType". The name of the XML element is derived by converting the DICOM
attribute name, using the rules illustrated at block 340. Upper-case letters are converted to lower-
case; each blank, hyphen, and slash is replaced with an underscore; and each apostrophe and
bracket is deleted.
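
The naming rules of block 340 can be sketched in a few lines of Python. This is a minimal illustration rather than the patented implementation: the function name element_name is hypothetical, and treating both round and square brackets as "brackets" is an assumption.

```python
# Translation table for the block 340 rules: blank, hyphen, and slash become
# an underscore; apostrophes and brackets are deleted (mapped to None).
# Assumption: "bracket" covers both round and square brackets.
_NAME_TABLE = str.maketrans({
    " ": "_", "-": "_", "/": "_",
    "'": None, "(": None, ")": None, "[": None, "]": None,
})


def element_name(attribute_name: str) -> str:
    """Derive an XML element name from a DICOM attribute name."""
    return attribute_name.lower().translate(_NAME_TABLE)


print(element_name("Patient's Name"))        # patients_name
print(element_name("Patient's Birth Date"))  # patients_birth_date
```

Names derived this way line up with the template names used in the stylesheets of FIG. 4B, such as "patients_birth_date_template".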

The XML attributes are defined as illustrated at block 350. All elements have a common 
CodingScheme value, such as "DCMTAG". The DICOM CodeID, which was parsed in the
DICOM parser 120 of FIG. 2, is used as the value of the XML CodeID attribute, and, if the
element is not a sequence element, the Value attribute is given the DICOM attribute value, 
which was also parsed in the DICOM parser 120. 
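
The element creation of blocks 332, 336, and 350, together with the recursive handling of nested attributes in the loop 320-370, might look roughly like the Python sketch below. It assumes the parser has already produced, for each attribute, a converted element name, a CodeID string, and a ValueType; the ParsedAttribute class, the function names, and the CodeID format shown are hypothetical stand-ins, not the builder 130 itself.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List


@dataclass
class ParsedAttribute:
    """Hypothetical in-memory form of one attribute delivered by the parser."""
    element_name: str   # name already converted per block 340
    code_id: str        # DICOM tag; the string format here is an assumption
    value_type: str     # ValueType already mapped per block 360
    value: str = ""
    children: List["ParsedAttribute"] = field(default_factory=list)


def append_attribute(parent: ET.Element, attr: ParsedAttribute) -> ET.Element:
    """Create the XML element for one attribute, recursing into sequences."""
    elem = ET.SubElement(parent, attr.element_name)
    elem.set("CodingScheme", "DCMTAG")
    elem.set("CodeID", attr.code_id)
    if attr.value_type == "sequence":
        # Sequence elements carry no Value; nested attributes become children.
        for child in attr.children:
            append_attribute(elem, child)
    else:
        elem.set("Value", attr.value)
    elem.set("ValueType", attr.value_type)
    return elem


def build_raw_xml(attributes: List[ParsedAttribute]) -> ET.Element:
    """Build the raw XML tree under the root element "report" (block 310)."""
    root = ET.Element("report")
    for attr in attributes:
        append_attribute(root, attr)
    return root


raw = build_raw_xml([
    ParsedAttribute("patients_name", "00100010", "string", "Doe^John"),
    ParsedAttribute("content_sequence", "0040A730", "sequence", children=[
        ParsedAttribute("text_value", "0040A160", "string", "No abnormality"),
    ]),
])
print(ET.tostring(raw, encoding="unicode"))
```

Keeping the raw tree this close to the DICOM attribute list is what lets the output format be changed later through stylesheets alone.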

The mapping of DICOM attribute data types 390a to the XML ValueType attributes 390b 
is illustrated at 390. DICOM attributes of SS and US type are assigned ValueType "signed short" 
and "unsigned short", respectively; attributes of FL and FD type are assigned ValueType "float"; 
attributes of AT and UL type are assigned ValueType "unsigned long"; attributes of SL type are
assigned ValueType "signed long"; and attributes of type SQ are assigned ValueType
"sequence". All other attribute types are assigned ValueType "string". This mapping is effected 
at block 360, based on the parsed value of the DICOM attribute's data type. 
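
The mapping at 390 amounts to a small lookup table with a default. A minimal Python sketch follows; the names VALUE_TYPES and value_type are hypothetical, while the table entries are taken directly from the mapping described above.

```python
# DICOM data type (390a) to XML ValueType (390b), as described for block 390.
VALUE_TYPES = {
    "SS": "signed short",
    "US": "unsigned short",
    "FL": "float",
    "FD": "float",
    "AT": "unsigned long",
    "UL": "unsigned long",
    "SL": "signed long",
    "SQ": "sequence",
}


def value_type(data_type: str) -> str:
    """Return the XML ValueType for a DICOM data type; default is "string"."""
    return VALUE_TYPES.get(data_type, "string")


print(value_type("US"))  # unsigned short
print(value_type("PN"))  # string
```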

After conversion of each DICOM attribute to a corresponding XML element, the 
DICOM-to-XML converter 100 invokes the XSLT engine 140, which may be any of a variety of 
commonly available XSLT engines, to provide the desired XML output format, as discussed 
above. The XSLT engine uses a conventional XML parser 220 to facilitate the identification of 
each data item in the raw XML data for subsequent output formatting based on the 
aforementioned stylesheets 150 of FIG. 1. 

FIGs. 4A-C illustrate example XSLT stylesheets for the conversion of raw XML
formatted information into an XML format that is consistent with DICOM-specific DTDs and 
Schemas. As noted above, these DTDs and Schemas will have been defined for use by 
applications designed to process DICOM-related material, and particularly DICOM Structured 
Reports (SRs). 

FIG. 4A corresponds to the DICOM-SR XSLT high-level structure stylesheet 410. As 
illustrated in the "include" portion 412 of the stylesheet 410, the high-level structure includes 
references ("href=") to five different stylesheets, "Patient_IE.xsl", "Study_IE.xsl",
"Series_IE.xsl", "Equipment_IE.xsl", and "Document_IE.xsl", corresponding to the five
"Information Entities" (IEs) in the DICOM SR. At 414, the stylesheet 410 calls for a match
between the root element in the raw XML file and the word "report", which was assigned to the
root element of the raw XML file at 310 in FIG. 3. Upon finding the match, the stylesheet 410
provides, at portion 416, header information for the XML file, SRDocument, that is being 
created, including the report identification, and the report date. At portion 418, each of the 
Patient, Study, Series, Equipment, and Document templates/stylesheets are called to produce the 
remainder of the SRDocument. 

FIG. 4B corresponds to the aforementioned "Patient_IE.xsl" stylesheet 420 that is 
referenced in the high-level structure stylesheet 410. Each element within the DICOM "Patient" 
IE has a corresponding template for outputting the contents of the element in a particular form. 
For example, the template for placing the patient's name into the XML output file is illustrated in 
FIG. 4B as "patients_name_template" 422; the templates for placing the patient's identification
and birthdate are illustrated as "patients_id_template" 424 and "patients_birth_date_template"
426, and so on. As noted above, by using an XSLT engine to create the appropriately formatted
output based on stylesheets that contain templates for creating the output, different output
formats can be provided by merely changing the appropriate templates. The "Study_IE.xsl",
"Series_IE.xsl", and "Equipment_IE.xsl" stylesheets are similarly encoded, using the appropriate
calls to templates corresponding to elements within each of these Information Entities. 

FIG. 4C corresponds to the remaining "Document_IE.xsl" stylesheet 430. In a preferred
embodiment the "Document_IE.xsl" is partitioned into three simpler stylesheets:
"SR_Document_General_Module.xsl", "SR_Document_Content_Module.xsl", and
"SOP_Common.xsl", corresponding to the DICOM SR Document General, SR Document
Content, and SOP Common modules of the DICOM Document Information Entity. These
module stylesheets are included in the stylesheet 430, at 432, and each of the templates is
invoked to provide the appropriately formatted XML output corresponding to the DICOM
Document IE, at 434. The DICOM SR includes three forms of Information Object Definitions
(IODs): a basic text SR, an enhanced SR, and a comprehensive SR. Each of these forms of IODs
is supported by providing a separate "SR_Document_Content_Module.xsl" stylesheet.

The foregoing merely illustrates the principles of the invention. It will thus be 
appreciated that those skilled in the art will be able to devise various arrangements which, 
although not explicitly described or shown herein, embody the principles of the invention and 
are thus within the spirit and scope of the following claims. 